Added support for compression on meta device #376
Conversation
Review threads (resolved) on:
src/compressed_tensors/compressors/model_compressors/model_compressor.py
src/compressed_tensors/compressors/quantized_compressors/base.py
src/compressed_tensors/compressors/quantized_compressors/pack_quantized.py
src/compressed_tensors/compressors/sparse_compressors/sparse_24_bitmask.py
Please make sure the test cases in folders ending in `_skipped` now pass: https://github.com/vllm-project/llm-compressor/tree/main/tests/llmcompressor/transformers/compression
Looking good! One comment, and one suggestion applied to 3 different lines.
Update …4_bitmask.py (3 commits applying review suggestions)
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
They're passing now. Logs are pasted in the PR description. 🫡
I don't really understand the changes to `pack_to_int32`. The original function doesn't actually use any explicit numpy calls (outside of `start` and `end`), so I don't see why the original function wouldn't work with meta tensors?
Adding some tests for compressing meta models, as well as for using the compressors with `is_meta=True`, would help with this.
Also, are the changes from here required?
Review thread: src/compressed_tensors/compressors/quantized_compressors/pack_quantized.py
Spoke more with @shanjiaz and clarified some things. Tests are correct and passing, and the logic looks correct. Nice job!
nice job
Summary:
This PR adds model compression/decompression support for models instantiated on the meta device, and updates downstream dependencies as well. Specifically, it touches the following (a sketch of the underlying idea follows this list):
- Sparse24BitMaskCompressor
- Quantized Compressors
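A minimal sketch of the underlying idea (the helper name here is hypothetical, my own invention rather than the library's API): for a fixed scheme like 2:4 bitmask sparsity, the shapes of the compressed tensors are pure functions of the input shape, so compression can be described without ever touching weight data.

```python
import torch

def sparse24_bitmask_meta_shapes(weight: torch.Tensor):
    """Hypothetical helper: compressed-tensor shapes for 2:4 bitmask sparsity.

    2:4 sparsity keeps 2 of every 4 values, and the bitmask stores one bit
    per original element, packed into bytes.
    """
    rows, cols = weight.shape
    values = torch.empty(rows, cols // 2, dtype=weight.dtype, device="meta")
    bitmask = torch.empty(rows, (cols + 7) // 8, dtype=torch.uint8, device="meta")
    return values, bitmask

w = torch.empty(128, 256, dtype=torch.bfloat16, device="meta")
values, bitmask = sparse24_bitmask_meta_shapes(w)
print(values.shape, bitmask.shape)  # torch.Size([128, 128]) torch.Size([128, 32])
```

The quantized compressors admit the same treatment: for example, packing 4-bit values into int32 fits eight values per word, so the packed shape is again pure shape arithmetic.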
Test:
Tested with `pytest tests/test_compressors` in compressed-tensors and `pytest tests/quantization/compressed_tensors_integration/` in transformers; all tests passed. The `_skipped` tests in llm-compressor now pass; will re-enable them after the transformers PR merges. Nightly & e2e all pass as well : )